CHAPTER 12 Comparing Proportions and Analyzing Cross-Tabulations 161
» Most statistical software also lets you run these tests on summarized data
(rather than individual-level data), as long as you set the appropriate
option in your code when running the tests. In contrast, online calculators
that perform these tests expect you to have already cross-tabulated the
data. These calculators usually present a screen showing an empty table,
and you enter the counts into the table’s cells to run the calculation.
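To make the difference between individual-level and summarized data concrete, here is a minimal Python sketch that cross-tabulates a small data set into the kind of table of counts an online calculator would ask for. The records, the variable levels, and the helper name cross_tabulate are all made up for illustration; they don't come from any real study or software package.

```python
from collections import Counter

# Hypothetical individual-level data: one (treatment, outcome) pair per participant.
records = (
    [("drug", "improved")] * 3
    + [("drug", "not improved")] * 1
    + [("placebo", "improved")] * 1
    + [("placebo", "not improved")] * 3
)


def cross_tabulate(pairs):
    """Count how many records fall into each (row level, column level) cell."""
    counts = Counter(pairs)
    row_levels = sorted({row for row, _ in pairs})
    col_levels = sorted({col for _, col in pairs})
    # Counter returns 0 for any combination that never occurs in the data.
    table = [[counts[(row, col)] for col in col_levels] for row in row_levels]
    return row_levels, col_levels, table


row_levels, col_levels, table = cross_tabulate(records)

# Print the summarized data: column labels first, then one line per row level.
print(col_levels)
for level, row in zip(row_levels, table):
    print(level, row)
```

The nested list in table is the summarized form of the data. The eight individual records carry exactly the same information for these tests as the four cell counts.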
Examining Two Variables with the Pearson Chi-Square Test
The most commonly used statistical test of association between two categorical
variables is the chi-square test of association, developed by Karl Pearson
around 1900. It’s called the chi-square test because it involves calculating
a number, called a test statistic, whose random fluctuations follow the
chi-square distribution. Many other statistical tests also use the chi-square
distribution, but the test of association is by far the most popular. In this
book, whenever we refer to a chi-square test without specifying which one, we
mean the Pearson chi-square test of association between two categorical
variables. (Please note that some books use the notation χ² or X² instead of
spelling out the term chi-square.)
Understanding how the chi-square test works
You don’t have to understand the equations behind the chi-square test if you have
a computer to do the calculations for you, which is the best approach, although
you can also calculate the test manually. This means you technically don’t have
to read this section. But we encourage you to do so anyway, because we think
you’ll have a better appreciation for the strengths and limitations of the test
if you know its mathematical underpinnings. Here, we walk you through conducting
a chi-square test manually (which you can do in Microsoft Excel).
Calculating observed and expected counts
All statistical significance tests start with a null hypothesis (H₀) that asserts that no
real effect is present in the population, and any effect you think you see in your
sample is due only to random fluctuations. (See Chapter 3 for more information.)
The H₀ for the chi-square test asserts that there’s no association between the
levels of the row variable and the levels of the column variable, so you should
expect the relative spread of cell counts across the columns (that is, the
proportion of each row’s total that falls into each column) to be the same for
each row.
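To see what "the same relative spread for each row" means in numbers, here is a minimal Python sketch. It assumes the standard formula for the expected count of a cell under the null hypothesis, which is the row total times the column total, divided by the grand total. The observed counts and the function name expected_counts are hypothetical and serve only as an illustration.

```python
def expected_counts(observed):
    """Return the cell counts expected if rows and columns are unassociated.

    Each expected count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [row_total * col_total / grand_total for col_total in col_totals]
        for row_total in row_totals
    ]


# Hypothetical observed counts: two rows (groups) by two columns (outcomes).
observed = [
    [30, 10],
    [20, 40],
]

expected = expected_counts(observed)

# Under the null hypothesis, every row splits across the columns
# in the same proportions as the column totals do.
row_proportions = [[cell / sum(row) for cell in row] for row in expected]

print(expected)
print(row_proportions)
```

In this made-up table, the column totals split 50/50, so each row of expected counts also splits 50/50 even though the two rows have different totals. The observed rows split 75/25 and 33/67, and the gap between observed and expected counts is what the test goes on to measure.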